fCite - fractional citation tool

Frequently Asked Questions

Why was fCite created in the first place?

fCite was created to make the assessment of an individual's scientific research output easier and fairer. It should also allow for the allocation of limited funds to the people who have truly contributed to the development of science.

Why does allocating scientific contributions matter?

We strongly believe that quality not quantity should matter. Therefore, even a single but highly cited publication should matter more than a huge number of uncited publications, regardless of the journal in which the papers appeared. Citations reflect the utility of research, and even though some citations can be negative (reflecting disagreement with the presented result), it is unlikely that erroneous work will receive more than just a few citations. In most cases, scientists refer to past works when there is a need to do so, to indicate past contributions on which their ideas were built or to indicate the methodology used. In short, nanos gigantum humeris insidentes.

Consequently, science should not be a "rat race" in which scientists worry only about how to publish yet another meaningless paper. Science is not about co-authoring hundreds of publications or increasing one’s H-index. Science is about developing the ideas, or even a single idea, that change the world. Hopefully, fCite will help turn "Publish or Perish", which today can be translated as "Publish (many) or Perish", into "Publish (quality) or Perish".

Practical remark: Let us consider a situation in which two publications have been published in the same journal. Both of them received 100 citations after 3 years (such a score is already remarkable, being in the 99th percentile). The publications reflect years of effort by the two groups. However, now you see that the first publication has 3 authors, while the second publication has 100 authors. The key question now is: who did better work? If you had $1 million to spend as a grant agency, whose future ideas would you support? Or, an even worse question: would that one million dollars be enough to support 2-3 years of research by 100 scientists, including buying the necessary equipment?

Citations are sometimes compared to currency. (Un)fortunately, scientists, being very innovative, have devised a number of ways to cheat the system and improve bibliometric scores (see also Goodhart's law). In a nutshell, if citations are the currency of science, the current situation is beginning to resemble Zimbabwe's hyperinflation. Every scientist receives all the credit (the citations) regardless of how much he/she contributed to the work. In the above example, the first team that received 100 citations created 300 citations for their authors, while the second team can take pride in having created 10,000 citations out of thin air. Moreover, citations are not always equal. Often, authors cite themselves, which may or may not be justified (referring to related work in the past vs referring to multiple past publications only to increase the citation count). Therefore, when we analyse the three-author publication in greater detail, we can be almost certain that receiving 100 citations in 3 years means something. It is highly unlikely that all those citations were self-citations. The opposite is true for the second publication. If every author self-cited this publication only once within three years, that alone would be enough to reach 100 citations (some bibliometric tools allow one to filter out such citations; this key feature will be introduced into fCite in the near future).
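The arithmetic of the example above can be checked with a short sketch. It contrasts full counting (every author is credited with all citations) with an equal split among the authors. The function names are illustrative only and are not part of fCite.

```python
def full_credit_total(citations: int, n_authors: int) -> int:
    """Total credit handed out when every author receives all citations."""
    return citations * n_authors


def equal_share(citations: int, n_authors: int) -> float:
    """Credit per author when the citations are split equally."""
    return citations / n_authors


# The two publications from the example: 100 citations each,
# written by 3 and by 100 authors respectively.
for n_authors in (3, 100):
    total = full_credit_total(100, n_authors)
    share = equal_share(100, n_authors)
    print(f"{n_authors} authors: {total} credits in total, "
          f"{share:.2f} per author if split equally")
```

With full counting, the credit grows with the length of the author list; with a fractional split, the total credit stays at the 100 citations the paper actually received.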

Will fCite hamper the collaborative spirit of science?

The first argument that will be raised against fCite will be that the service will impede collaboration between research groups. We believe that this is completely untrue. We, as scientists, collaborate with each other when there is a mutual benefit (someone has unique expertise, someone has access to and can run complicated equipment, someone else brings his/her theory, etc.). Scientists do not collaborate to make only a small contribution and be on the author list of a paper to return the favour in the future (or at least they should not). In fact, fCite will encourage effort to maintain the important collaborations in which you invest your limited time. It will allow us to focus on what is important.

How is the start of a researcher's career defined?

In the output of fCite, the very first thing you will see is the period within which the author published (e.g., "Published between 2010-2017 (8 years)"). As the beginning of the researcher’s career, we consider the year when he/she published the first publication (regardless of his/her position on the author list). This should be considered a very rough approximation, as in some fields or research groups, even bachelor’s students co-author publications if their input was significant. In other situations, the first publication can appear at the end of the PhD or even a few years after (additionally, the delay between submitting the manuscript and acceptance is on average 9 months, and it sometimes can be counted in years). As you see, this difference may represent as much as ~7-10 years of research experience. Nevertheless, if someone co-authored a publication, this is considered the start of his/her research career. Although the starting year may look like a minor parameter, it actually has paramount value. It can affect practically all bibliometric metrics. Taking into account the number of years someone has been doing science allows us to compare the researcher to his/her peers in a fairer way.

Practical note: two researchers with a large difference in publishing span should not be compared directly even if their publishing span has been adjusted to make them equal (e.g., considering only the last 5 years). Bear in mind that cumulative scores such as the total number of citations and the total RCR do not accumulate linearly. The longer a researcher has been doing science, the faster he/she can accumulate citations. In other words, the more respected you are in your field, the easier it is to publish your next paper (e.g., it is less likely that you will get a "desk rejection"). Moreover, a senior researcher is most likely already the supervisor of a research group (which can be of various sizes, even >50 people), while a younger researcher may just be beginning to assemble a group or be a PhD student or postdoc. Thus, it is unrealistic to expect that a junior scientist will have a similar publication output to a senior colleague (at least not in quantity). In such situations, it is advisable to check average scores (e.g., average article score, contribution to a single paper).

How should one interpret the number of publications and why should they be divided into sole, first, last and middle authored items?

One of the most important (and quantitative) results of scientific endeavour is publication. It describes the discovery or the work. It also gives credit to the author(s). Therefore, authors who have contributed to many works are considered better than those who have not. Additionally, in many fields of science, the order of the authors is not random and specifies the contribution to the discovery (see the description of the FLAE, FLAE2 and FLAE3 models for more details). When there is only one author, the contribution is unquestionable, and the situation is simple (even that is not entirely true, such as in the citation count). Otherwise, it becomes difficult to capture the importance of the research output if we need to read/analyse dozens of papers in which some are first author papers while others are middle author papers (usually, this is the time when most people, whether consciously or not, begin to make short-cuts and turn to the impact factors, the H-index, and total citation counts, reading only the first or last authorship papers, etc.). Finally, each publication has a different number of citations, and it is published in journals of varying quality (consider the miserable practice of using the impact factors of journals to assess the quality of a single paper/author).

Nevertheless, if FLAE models apply in a given field, it is useful to check how many papers of a given type are in a portfolio. An example output for this section can look as follows:

Author: John Smith (282 articles)
Single authorship papers:    11/282 ( 3.90)
First authorship papers:     33/282 (11.70)
Last authorship papers:      98/282 (34.75)
Middle authorship papers:   140/282 (49.65)
Avg number of papers per year:  11.28

H-index 38, M-index 1.9, fH-index 15, fM-index 0.75                        
Avg number of papers per year: 14.10

Even such simple statistics convey very useful information. In the given example, it is evident that the researcher is a successful senior scientist, active in science for 20 years, who has co-authored many publications. You can be almost sure that she/he supervises a large group of junior researchers and/or collaborates with many research groups (otherwise, it would not be possible to co-author over 10 papers per year; additionally, this last parameter can be used to make a rough estimate of her/his group size, which is approximately 2x larger than the average number of last author papers in the last 2-5 years). Moreover, fCite makes it possible to filter out non-research papers.
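A breakdown like the one shown above can be sketched in a few lines. This is a minimal illustration, assuming each paper is described by the author's 1-based position and the number of authors; the function names and the input format are hypothetical and do not reflect how fCite stores its data.

```python
from collections import Counter


def authorship_type(position: int, n_authors: int) -> str:
    """Classify a 1-based author position as single, first, last or middle."""
    if n_authors == 1:
        return "single"
    if position == 1:
        return "first"
    if position == n_authors:
        return "last"
    return "middle"


def summarize(papers: list[tuple[int, int]], active_years: int) -> dict:
    """Summarize (position, n_authors) pairs into counts, percentages
    and the average number of papers per year."""
    counts = Counter(authorship_type(pos, n) for pos, n in papers)
    total = len(papers)
    percent = {kind: round(100 * c / total, 2) for kind, c in counts.items()}
    return {
        "counts": dict(counts),
        "percent": percent,
        "papers_per_year": round(total / active_years, 2),
    }


# Made-up portfolio: one single-author paper and three four-author papers.
example = [(1, 1), (1, 4), (2, 4), (4, 4)]
print(summarize(example, active_years=2))
```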

What can we learn from the average number of papers per year?

As stated in the previous point, the average number of papers per year is a good indicator of group size (if the FLAE model applies, usually the number of (last author) papers over the past 2-5 years will reflect the size of the research group). In some cases, you will see that this score can be >10-20. Therefore, when assessing a senior scientist, it is not advisable to use the H-index, the total citation counts and similar aggregation-based metrics. In such a case, it is more appropriate to use average scores such as AVG_ARTICLE_SCORE, AVG_YEAR_SCORE, etc.

Why is there a 3-100 PMIDs limit in fCite?

Lower bound limit: If you want to analyse only a handful of papers, just read them.
Upper bound limit: Due to limited computer resources (note that fCite is provided for free), we needed to apply a limit of 100 PMIDs per query. Simply divide your list into 100-PMID parts if needed.
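Splitting a long list into parts of at most 100 PMIDs can be done with a small helper. This is only a sketch: the function name is made up, and the PMIDs in the example are placeholders rather than real identifiers.

```python
def split_into_batches(pmids: list[str], batch_size: int = 100) -> list[list[str]]:
    """Split a list of PMIDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [pmids[i:i + batch_size] for i in range(0, len(pmids), batch_size)]


# Placeholder identifiers, used only to show the batch sizes.
placeholder_pmids = [str(n) for n in range(1, 251)]
for batch in split_into_batches(placeholder_pmids):
    print(len(batch))
```

Keep in mind that the last part can end up below the lower limit of 3 PMIDs (e.g., 201 PMIDs give parts of 100, 100 and 1), in which case the parts need to be rebalanced by hand.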

How should one interpret AVG_YEAR_SCORE?

AVG_YEAR_SCORE is the fractional score (FLAE_RCR) from all publications divided by the number of years. It gives an estimate of the yearly yield of the scientist.

How should one interpret AVG_ARTICLE_SCORE?

AVG_ARTICLE_SCORE is the fractional score (FLAE_RCR) per single publication (FLAE_RCR divided by the number of publications). It is a highly useful metric for the assessment of group leaders, as this score reflects the average quality regardless of the group size (in other words, publishing 50 papers per year will not increase this score if they are on average weak). This score was developed in the quest for preferring quality over quantity.

How should one interpret AVG ARTICLE IMPACT per YEAR?

AVG ARTICLE IMPACT per YEAR is the yearly impact of articles published per year (AVG_ARTICLE_SCORE * average number of papers per year). This score partly captures the whole research group’s impact and can be severely distorted by group size.
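The three average scores described above can be written down directly from their definitions. The sketch below is a literal reading of those definitions, not fCite's actual code; the inclusive counting of years follows the "2010-2017 (8 years)" example given earlier, and the per-paper FLAE_RCR values are made up.

```python
def active_years(first_year: int, last_year: int) -> int:
    """Inclusive publishing span, e.g. 2010-2017 counts as 8 years."""
    return last_year - first_year + 1


def average_scores(flae_rcr_per_paper: list[float], years: int) -> dict[str, float]:
    """Compute the average scores from per-paper FLAE_RCR values."""
    total = sum(flae_rcr_per_paper)
    n_papers = len(flae_rcr_per_paper)
    avg_article = total / n_papers
    papers_per_year = n_papers / years
    return {
        "AVG_YEAR_SCORE": total / years,
        "AVG_ARTICLE_SCORE": avg_article,
        "AVG_ARTICLE_IMPACT_PER_YEAR": avg_article * papers_per_year,
    }


# Made-up portfolio: four papers published between 2010 and 2017.
scores = average_scores([2.0, 1.0, 0.5, 0.5], active_years(2010, 2017))
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under this literal reading, and with the same year count in both places, the last value coincides with AVG_YEAR_SCORE; fCite may count the years or average the articles differently.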

FLAE/RCR aka average percentage contribution in a single paper

FLAE/RCR is a measure of percentage contribution towards the publication. It represents how much a given author contributed, on average, to the content of his/her articles. This score is tightly connected with the average number of authors, and finally, it can be used to assess how much the author contributed relative to other authors (using so-called 'Expected FLAE/RCR').

Author's contribution to his/her publications and expected FLAE/RCR

Expected FLAE/RCR is a measure of the percentage contribution to the publication under an equal contribution (EC) model. It shows the expected contribution. Comparing 'FLAE/RCR' with 'Expected FLAE/RCR' can be used to assess how much the given author contributed relative to other authors.
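The comparison between 'FLAE/RCR' and 'Expected FLAE/RCR' can be illustrated with a small sketch. It assumes that both percentages are ratios of fractional RCR to total RCR over the whole portfolio; whether fCite weights papers by RCR or averages the per-paper shares is not stated here, so treat this as one possible reading. The `Paper` class, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    rcr: float         # Relative Citation Ratio of the paper
    n_authors: int     # number of authors on the paper
    flae_share: float  # this author's share under a FLAE-type model (0..1)


def contribution_summary(papers: list[Paper]) -> dict[str, float]:
    """Return the author's percentage contribution under FLAE and under EC."""
    total_rcr = sum(p.rcr for p in papers)
    flae_rcr = sum(p.rcr * p.flae_share for p in papers)
    ec_rcr = sum(p.rcr / p.n_authors for p in papers)  # equal split
    return {
        "FLAE/RCR (%)": 100 * flae_rcr / total_rcr,
        "Expected FLAE/RCR (%)": 100 * ec_rcr / total_rcr,
    }


# Made-up portfolio: a first-author paper and a middle-author paper.
portfolio = [Paper(rcr=4.0, n_authors=4, flae_share=0.5),
             Paper(rcr=2.0, n_authors=10, flae_share=0.05)]
for name, value in contribution_summary(portfolio).items():
    print(f"{name}: {value:.1f}")
```

A FLAE/RCR value above the expected one suggests that the author tends to hold the emphasized (first or last) positions; a value below it suggests mostly middle authorship.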

Average contribution to a single paper

Regardless of whether the EC or FLAE models apply in a given field, even the simple division of contributions in terms of the number of authors can be highly useful and informative. This was one of the strongest motivations for creating fCite: to give credit where it is due. In recent years, scientists have developed numerous ways to inflate citation numbers and the H-index. One of the simplest and most commonly used methods is extending the list of authors (so-called "guest" authorship). Some fields (e.g., high-energy physics, medicine) have already "adapted" so well that a publication with >100 authors is considered normal, and "diamond" publications co-authored by >1,000 people have begun to emerge. For instance, every year, a list of so-called "Highly Cited Researchers (HCR list)" is published (currently >3,200 scientists) that tries to identify the most influential scientists. People on the HCR list must have (co)-authored at least 15 highly cited papers and usually have thousands of citations in total. The problem with this list is that it does not take into account the number of authors. One such author in medicine, who had published hundreds of publications, was analysed using fCite. The result was troubling. While the total RCR score was >1,300 (an impressive score even for a "Highly Cited Researcher"), his fractional contribution was very low (EC=75, FLAE=50). Most importantly, his average percentage contribution in a single publication was 3.5%, a score that in many fields would be below even the acknowledgment level (for more information, see Examples).

Why is the Impact Factor of a journal not used?

The Impact Factor is a score that is based on a simple arithmetic average that has been applied to a non-Gaussian distribution of citations. As such, it is a completely useless and skewed metric that should never be used. While the IF can carry some information about the journal as a whole if the journal is large enough (thousands of publications each year, which is not usually the case), it is meaningless if we consider a single publication by a given author. The fCite philosophy is to reward citations (which reflect the utility of the research) rather than the place where a contribution has been published. Simply speaking, even if a publication was published in the best journal, it means very little until other people can benefit from it (if no one has cited the paper, it is evident that this particular research did not advance science).

Does fCite take into account self-citations?

Unfortunately, no. This is one of the greatest disadvantages of the current version of fCite. Although it is natural for a scientist to refer to his/her past research when he/she continues or extends earlier studies, a self-citation rate of >10-20% can be considered a strong warning (see Fig. 3 from Ioannidis et al. 2016), while in some cases there are "researchers" who have a rate above 50%. Including self-citations in the analysis will be a key development in future versions of fCite.

What models are used for dividing citations and RCR?

Currently, fCite supports the following models:

The FLAE model stands for “first-last-author-emphasis” and is based on the definition of Tscharntke et al. 2007 with slight modifications. The contributions of individual authors can be briefly described as "the first author gets 100, the last 50, and all others 100/number of authors, and then scores are normalized to 1". Alternatively, see the contribution matrix in the raw format.
The FLAE2 model is based on Corrêa Jr. et al. 2017. Note that the values for this model are based on Fig. 7 and were read manually from the plot because the authors did not provide quantitative data. Alternatively, see the contribution matrix in the raw format.
In the FLAE3 model, the contributions of individual authors can be briefly described as "the first author always gets at least 3 times more than co-authors, and the last author at least 2 times more than other co-authors". Alternatively, see the contribution matrix in the raw format.
The EC model represents an "equal contribution", which means that we assume that each author contributed equally to the given work. Alternatively, see the contribution matrix in the raw format.

All of these models can be applied separately to RCR and citation counts.
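The verbal descriptions of the FLAE and EC models can be turned into a short sketch. It follows only the quoted rule ("the first author gets 100, the last 50, and all others 100/number of authors, and then scores are normalized to 1") and does not reproduce the "slight modifications" used by fCite, so its values may differ from the published contribution matrices. The handling of single-author papers is an assumption, and FLAE2 and FLAE3 are left out because their exact values are not given in the text.

```python
def ec_shares(n_authors: int) -> list[float]:
    """Equal contribution: every author gets the same share."""
    return [1.0 / n_authors] * n_authors


def flae_shares(n_authors: int) -> list[float]:
    """First author 100, last author 50, others 100/n_authors, normalized to 1."""
    if n_authors == 1:
        return [1.0]  # assumption: a sole author gets the full credit
    raw = [100.0 / n_authors] * n_authors  # middle authors
    raw[0] = 100.0                         # first author
    raw[-1] = 50.0                         # last author
    total = sum(raw)
    return [weight / total for weight in raw]


for n in (2, 4, 10):
    print(n, [round(s, 3) for s in flae_shares(n)])
```

Multiplying a paper's citation count or RCR by the author's share gives the fractional score of that paper for that author.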

Which model is the best or which one should I use?

There is no simple answer to such questions. It may depend on the discipline, field and so on. In some fields, the authors' order may be alphabetical (then, only the EC model applies). In many fields, it is customary for the first author to be the one who did most of the work (e.g., the paper represents, roughly speaking, his/her PhD thesis on which he/she has spent the last few years almost exclusively), while the last author is the supervisor who got the funding and conceived and designed the project while doing different degrees of experimental work. All other authors are considered supporting staff for specialized but necessary steps. In such a situation, the FLAE, FLAE2 and FLAE3 models can be used. It should be kept in mind that none of these models is a substitute for reading the papers and/or asking the author about his/her contribution to a particular work. Therefore, those models can only be considered a very rough approximation of the author's research output, which can vary substantially across papers even if they have the same number of authors. Nevertheless, when dozens or even hundreds of papers are considered, fCite can facilitate the assessment of the portfolio. Moreover, after some practice, you will quickly notice that although the FLAE, FLAE2 and FLAE3 models differ in their details, the scores of all three models are very similar when more than 20 papers are analysed.

What is the difference between the H-index and fH-index?

The fH-index stands for the fractional H-index, and it takes into account citations of the given author using the FLAE model. For instance,

[600, 547, 247, 207, 199, 182, 170, 156, 106, 103, 102, 96, 92, 83, 81, 66, 63, 62, 61, 56, 55, 53, 53, 53, 49, 46, 43, 41, 40, 39, 38, 38, 37, 37, 36, 34, 33, 32, 32, 31, 30, 29, 27, 26, 25, 24, 19, 18, 18, 16, 15, 14, 13, 12, 12, 11, 10, 8, 8, 8, 6, 6, 5, 5, 4, 4, 4, 3, 2]
H-index:    35

[285.72, 107.4, 79.6, 36.0, 23.34, 21.68, 21.06, 20.73, 20.19, 17.61, 14.31, 14.08, 13.48, 13.1, 12.0, 11.51, 10.25, 9.47, 9.27, 9.25, 8.77, 8.73, 7.5, 7.17, 7.14, 6.79, 6.64, 6.54, 5.54, 5.33, 5.33, 4.91, 4.5, 4.42, 4.29, 4.22, 4.08, 3.31, 3.0, 2.87, 2.75, 2.74, 2.71, 2.39, 2.04, 2.03, 2.0, 2.0, 1.9, 1.71, 1.6, 1.56, 1.45, 1.44, 1.42, 1.36, 1.36, 1.33, 1.14, 1.0, 0.96, 0.82, 0.56, 0.5, 0.5, 0.41, 0.26, 0.24, 0.19]
Fractional H-index: 13
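Both values can be computed with the same routine, applied once to the raw citation counts and once to the fractional (FLAE) citation counts. The sketch below uses the standard H-index definition; using "greater than or equal" for fractional values is an assumption, and the short lists in the example are made up.

```python
def h_index(scores: list[float]) -> int:
    """Largest h such that at least h papers have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


# Made-up example: raw citations and fractional citations of five papers.
print(h_index([10, 8, 5, 4, 3]))
print(h_index([5.0, 2.4, 1.2, 0.8, 0.3]))
```

Applied to the two lists above, this function returns 35 for the raw citations and 13 for the fractional citations.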

Important note: the H-index is a cumulative score that is tightly related to the total number of citations (for details, we strongly encourage reading Yong's paper). Moreover, it is prone to manipulation and inflation (guest authorship, multiple authors, increasing the size of the research group). While the fH-index is more robust to those disadvantages, it is still cumulative, and additionally, its distribution is even flatter; thus, it has relatively low discriminatory power (many scientists will have a score of 1-10, and moving from, e.g., 5 to 6 can take years).

Instead of using the fH-index only, we highly encourage analysing the portfolio of papers using all metrics provided by fCite to discover all the pros and cons of the researcher's output, rather than summarizing him/her in a single, oversimplified number (e.g., the H-index or the sum of IFs, which are commonly used today).

What is the difference between the M-index and fM-index?

The M-index is the H-index divided by the number of active years (the number of years since the first paper). Consequently, the fM-index is a fractional version of the M-index.
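As a minimal sketch, both indices are a single division. The function name is illustrative, and how fCite counts the active years is not specified beyond the description above; the values used below are the ones from the example output shown earlier (H-index 38, fH-index 15, 20 active years).

```python
def m_index(h: float, active_years: int) -> float:
    """H-index (or fH-index) divided by the number of active years."""
    if active_years <= 0:
        raise ValueError("active_years must be positive")
    return h / active_years


print(m_index(38, 20))  # M-index: 1.9
print(m_index(15, 20))  # fM-index: 0.75
```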

What database is used by fCite?

fCite is based on the PubMed database (via iCite; 17,575,841 publications). fCite is updated monthly at the beginning of the month.

Why can the results differ between ORCID and PMIDs?

You can provide either a list of PMIDs extracted manually from PubMed (see the Help section for details) or an ORCID iD. Note that in most cases, a list of PMIDs will be more accurate. The public profile data for ORCID users were acquired on 27/10/2018. Therefore, they will not contain any newer data. Additionally, most ORCID profiles contain only DOI identifiers, which had to be mapped to PMIDs (this process is not always possible for technical reasons). Moreover, the publication data provided by ORCID users are usually limited/partial and do not reflect their full portfolio.

Why do you need to provide all combinations of first, middle, and last/family/surname(s) for a given author?

At first glance, it looks to be an easy task to automate, but there are a number of non-trivial cases that, in practice, make doing so almost impossible. There are many exceptions, such as only initials being given, the second name being added or omitted, non-standard characters being used or even people changing their surnames (e.g., women after marriage), among many other factors. Therefore, you need to do it yourself, but the fCite interface should make this almost painless.

What is the RCR score?

The Relative Citation Ratio (RCR) is a metric developed by the iCite team that tries to normalize citations for different fields and subfields of science. It has been developed for grant assessment by NIH teams. If the score is equal to 1, it means that the paper attracted a similar number of citations in a given time span as other papers in the field. For more details about RCR, see the iCite site.

What is the FLAE_RCR score?

FLAE_RCR is a metric derived from RCR by dividing the latter among the authors according to the FLAE model. It is a quality metric that measures individual researcher output (one of the most powerful metrics fCite provides). Most importantly, it can also decrease when the researcher starts to be less active and/or the research becomes outdated (this is in sharp contrast to the H-index and the total number of citations, which can only increase over time). According to a limited analysis of >800 researchers (unpublished results), FLAE_RCR can be summarized as follows:

FLAE_RCR   total RCR   Typical career stage
0-2        0-10        PhD student or below (beginning of the career)
2-6        5-20        PostDoc (note that accumulating an FLAE_RCR of 5 can take even 10-20 years)
5-15       10-40       young PI (starting as group leader of a small group of researchers)
10-100                 mature PI
100-200                the best scientists in the world, the first class
>500                   extraordinary scientists who have transformed the world, not more than a few living persons

Important note: Even though FLAE_RCR is a relatively strong metric, it should not be used as the only metric to describe researcher output. You should always check all metrics that fCite provides. For instance, many Nobel prize laureates have an FLAE_RCR score as "low" as 50-80, but thanks to the analysis with the other metrics, you will immediately see that such persons usually lead a small, focused research group and publish only a few papers per year (or fewer), even though those papers are quality papers cited by thousands of people (additionally, they are not middle author papers; frequently, those are the papers that date back to their postdoc or PhD period in which they were the first authors). Conversely, FLAE_RCR can be inflated (but much less easily than the H-index, for instance) by the size of the group, multiple authorship, guest authorship, or by forming unjustifiably big consortia and citation cartels.

Additionally, FLAE_RCR (or FLAE_Cit) alone cannot be used as the ultimate metric to quantify research output. A researcher’s output and his/her position in the scientific community depend on multiple factors that cannot currently be captured by fCite (e.g., the country, the affiliated university, the current position, the past collaborators, the size of the field, the authorship patterns).

What is the FLAE_cit score?

It is a fractional citation score. For the interpretation see the section about FLAE_RCR and extrapolate.

What is the total citation score?

It is the total number of citations in PubMed (the score corresponds to the total citation count in Google Scholar, Scopus and similar resources, but due to the smaller PubMed corpus, it will usually be lower). All authors obtain full credit regardless of the number of authors.

How should one interpret the percentiles provided for some metrics?

fCite provides estimations for some metrics (e.g., FLAE_RCR) to help explain their meaning (based on distributions of those scores calculated from ORCID data).

Let us consider the total RCR score of a research portfolio consisting of 12 publications. Assume that we obtained a score of 10. The first idea is to compare this score with the scientist’s peers in the field, preferably those at a similar career level. After running fCite multiple times, you will get a good feeling for whether a score of 10 is good or not. At this point, fCite provides some help. First, it shows the percentile of the score in comparison to ORCID users with three or more papers (this provides a baseline for how meaningful the score is in comparison to other researchers). Nevertheless, you should remember that bibliometric scores do not aggregate linearly over time; thus, you should not expect portfolios of 12 and 24 papers to have total RCRs of 10 and 20, respectively. Moreover, when there are >20-50 papers in the portfolio, it is obvious that in most cases, we are analysing a senior researcher and/or group leader. Therefore, you should not compare most of the metrics of a junior researcher who has 5-10 papers with those of a senior researcher who has 20-50 papers, as in the latter case the person is in charge of a substantial amount of human and time resources relative to the former.
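The idea of a percentile can be sketched as follows. This is a generic percentile rank (the percentage of reference scores that are less than or equal to the given score), not necessarily the exact procedure used by fCite, and the reference values below are made up for illustration rather than taken from ORCID data.

```python
from bisect import bisect_right


def percentile_rank(score: float, reference: list[float]) -> float:
    """Percentage of reference scores that are less than or equal to score."""
    if not reference:
        raise ValueError("reference distribution must not be empty")
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Made-up reference distribution of total RCR scores.
reference_scores = [0.5, 1.2, 3.0, 4.5, 7.0, 10.0, 12.5, 20.0, 35.0, 80.0]
print(percentile_rank(10.0, reference_scores))  # 60.0
```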

Why do I see so few citations versus Web of Science, Scopus or Google Scholar?

fCite is based on PubMed, which in many sub-fields covers a smaller number of indexed journals (and therefore publications), and as a result, you observe a lower citation count. Nevertheless, as long as PubMed can be considered a good representation of the most important journals in your field (e.g., life sciences, medicine, physics, biology) and as long as your comparison is restricted to fCite (i.e., you do not mix citations for the researchers across databases), fCite can be considered a comprehensive tool with comparable results (again, a number of restrictions apply, e.g., you should not compare a plant biologist with a cancer medicine doctor; see Examples for more). In short, the total number of citations is relative to the database considered, and it should never be used as an absolute measure of the impact of the researcher's output (the same applies to the H-index).

Where can I download the data for the presented plots?

fCite statistics are based on iCite and ORCID. You can download the bulk files that were used for the calculation of the plots, percentiles and other statistics presented here at the links below:

iCite: PMIDs_7M-19M_122018 and PMIDs_20M-32M_122018
ORCID profiles: figshare

Note that those links contain GBs of data spread across millions of files; thus, significant resources and skills will be required to process them.